Estimating Performance of Pipelined Spoken Language Translation Systems
Most spoken language translation systems developed to date rely on a
pipelined architecture, in which the main stages are speech recognition,
linguistic analysis, transfer, generation and speech synthesis. When making
projections of error rates for systems of this kind, it is natural to assume
that the error rates for the individual components are independent, making the
system accuracy the product of the component accuracies.
The paper reports experiments carried out using the SRI-SICS-Telia Research
Spoken Language Translator and a 1000-utterance sample of unseen data. The
results suggest that the naive performance model leads to serious overestimates
of system error rates, since there are in fact strong dependencies between the
components. Predicting the system error rate on the independence assumption by
simple multiplication resulted in a 16% proportional overestimate for all
utterances, and a 19% overestimate when only utterances of length 1-10 words
were considered.
Comment: 10 pages, LaTeX source. To appear in Proc. ICSLP '9
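The naive model described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component accuracies and the observed error rate below are made-up values, not figures from the paper, and "proportional overestimate" is assumed here to mean the predicted error rate minus the observed error rate, divided by the observed error rate.

```python
from math import prod


def predicted_system_accuracy(component_accuracies):
    """Naive model: components fail independently, so accuracies multiply."""
    return prod(component_accuracies)


def proportional_overestimate(predicted_error, observed_error):
    """Relative amount by which the prediction exceeds the observed error.

    Assumed definition: (predicted - observed) / observed.
    """
    return (predicted_error - observed_error) / observed_error


# Hypothetical per-stage accuracies for a three-stage pipeline
# (illustrative only, not taken from the paper).
component_accuracies = [0.9, 0.95, 0.9]

# Error rate predicted under the independence assumption.
predicted_error = 1 - predicted_system_accuracy(component_accuracies)

# Hypothetical end-to-end error rate measured on held-out data.
observed_error = 0.2

overestimate = proportional_overestimate(predicted_error, observed_error)
print(f"predicted error: {predicted_error:.4f}")
print(f"proportional overestimate: {overestimate:.2%}")
```

When component errors are positively correlated, so that the same utterances tend to fail at several stages, the observed error rate falls below the product-based prediction, which is the direction of the effect the abstract reports.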
Inclusion of a prosodic module in spoken language translation
Current speech recognition systems work mainly on a statistical basis and make no use of the information signalled by prosody, i.e. the segment duration and fundamental frequency contour of the speech signal. In more advanced applications of speech recognition, such as speech-to-speech translation systems, it is necessary to include the linguistic information conveyed by prosody. Earlier research has shown that prosody conveys information at the syntactic, semantic and pragmatic levels. The degree of linguistic information conveyed by prosody varies between languages, from languages such as English, with a relatively low degree of prosodic disambiguation, via tone-accent languages such as Swedish, to pure tone languages. The inclusion of a prosodic module in speech translation systems is not only vital in order to link the source language to the target language, but could also be used to enhance speech recognition proper. Besides syntactic and semantic information, properties such as dialect, sociolect, sex and attitude are also signalled by prosody. Speech-to-speech recognition systems that do not transfer this type of information will be of limited value for person-to-person communication. A tentative architecture for the inclusion of a prosodic module in a speech-to-speech translation system is presented.